Detecting Japanese Term Variation in Textual Corpus
Abstract
In this paper, we describe a rule-based mechanism that detects Japanese term variations in textual corpora. The system relies on meta-rules that map syntactic and morpho-syntactic variants of terms to their original forms. The framework has been successfully applied to languages such as English and French, and we show that it also works well for detecting Japanese term variants, once the specific characteristics of the Japanese language are properly taken into account. We also discuss the potential of this work for IR-related applications.
Similar resources
A Linguistic and Mathematical Method for Mapping Thematic Trends from Texts
We present a novel method for mapping thematic trends called "Classification by Preferential Clustered Link" (CPCL). This method clusters relevant textual units (terms) from a corpus of texts, based on meaningful linguistic relations (syntactic variations) identified amongst the units. Terms related through syntactic variations are represented in the form of a graph and are first clustered into...
Detecting Term Relationships to Improve Textual Document Sanitization
Nowadays, the publication of textual documents provides critical benefits to scientific research and business scenarios where information analysis plays an essential role. Nevertheless, the possible existence of identifying or confidential data in this kind of documents motivates the use of measures to sanitize sensitive information before being published, while keeping the innocuous data unmod...
Terminology-driven Augmentation of Bilingual Terminologies
This paper proposes a way of augmenting bilingual terminologies by using a “generate and validate” method. Using existing bilingual terminologies, the method generates “potential” bilingual multi-word term pairs and validates their status by searching web documents to check whether such terms actually exist in each language. Unlike most existing bilingual term extraction methods, which use para...
FlexiTerm: a flexible term recognition method
BACKGROUND: The increasing amount of textual information in biomedicine requires effective term recognition methods to identify textual representations of domain-specific concepts as the first step toward automating its semantic interpretation. The dictionary look-up approaches may not always be suitable for dynamic domains such as biomedicine or the newly emerging types of media such as patient...
Textual Enhancement across Linguistic Structures: EFL Learners' Acquisition of English Forms
The benefits of textual input enhancement in the acquisition of linguistic forms have produced mixed results in SLA literature. The present study investigates the effects of textual enhancement on adult foreign language intake of two English linguistic forms-subjunctive mood and inversion structures-to explore the role of the type of linguistic items in input enhancement studies. It also invest...